Search results for "Natural Language Processing"
showing 10 items of 413 documents
UML Style Graphical Notation and Editor for OWL 2
2010
OWL is becoming the most widely used knowledge representation language. It has several textual notations but no standard graphical notation apart from verbose ODM UML. We propose an extension to UML class diagrams (heavyweight extension) that allows a compact OWL visualization. The compactness is achieved through the native power of UML class diagrams extended with optional Manchester encoding for class expressions thus largely eliminating the need for explicit anonymous class visualization. To use UML class diagram notation we had to modify its semantics to support Open World Assumption that is central to OWL. We have implemented the proposed compact visualization for OWL 2 in a UML style …
Bayesian Modelling of Confusability of Phoneme-Grapheme Connections
2007
Deficiencies in the ability to map letters to sounds are currently considered to be the most likely early signs of dyslexia. This has motivated the use of Literate, a computer game for training this skill, in several Finnish schools and households as a tool in the early prevention of reading disability. In this paper, we present a Bayesian model that uses a student's performance in a game like Literate to infer which phoneme-grapheme connections the student currently confuses with each other. This information can be used to adapt the game to a particular student's skills as well as to provide information about the student's learning progress to their parents and teachers. We apply our model to …
Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation
2020
Neural machine translation systems are typically trained on curated corpora and break when faced with non-standard orthography or punctuation. Resilience to spelling mistakes and typos, however, is crucial, as machine translation systems are used to translate texts of informal origin, such as chat conversations, social media posts and web pages. We propose a simple generative noise model to generate adversarial examples of ten different types. We use these to augment machine translation systems’ training data and show that, when tested on noisy data, systems trained using adversarial examples perform almost as well as when translating clean data, while baseline systems’ performance drops by…
SisHiTra: A Hybrid Machine Translation System from Spanish to Catalan
2004
In the current European scenario, characterized by the coexistence of communities writing and speaking a great variety of languages, machine translation has become a technology of capital importance. In areas of Spain and of other countries, the co-officiality of several languages implies producing several versions of public information. Machine translation between all the languages of the Iberian Peninsula and from them into English will allow for a better integration of Iberian linguistic communities with one another and within Europe. The purpose of this paper is to show a machine translation system from Spanish to Catalan that deals with text input. In our approach, both deductive (linguistic) and…
Prosodic phenomena in simultaneous interpreting
2005
This paper reports on an empirical study on prosody in English-German simultaneous interpreting. It discusses prosody with particular reference to its tonal, durational and dynamic features, such as intonation, pauses, rhythm and accent, as well as its main functions, i.e. structure and prominence. Following a review of previous studies on the topic, a conceptual approach for the analysis of prosody in terms of structure and prominence is developed and subsequently applied to an authentic corpus of professional simultaneous interpretation consisting of three German versions of a 72-minute English source text. Prosodic patterns in the corpus are analyzed by means of a computer-aided method u…
Register Variation Across English Pharmaceutical Texts: A Corpus-driven Study of Keywords, Lexical Bundles and Phrase Frames in Patient Information L…
2013
This study constitutes an initial step towards filling a gap in corpus linguistics studies of linguistic and phraseological variation across English pharmaceutical texts, in particular in terms of recurrent linguistic patterns. The study, conducted from a register perspective (Biber & Conrad, 2009) and employing both quantitative and qualitative research procedures, aims to provide a corpus-driven description of vocabulary and phraseology, namely key words, lexical bundles, and phrase frames, used in patient information leaflets and summaries of product characteristics (represented by 463 and 146 texts, respectively) written originally in English and collected in two domain-spec…
Intelligent Agents supporting user interactions within self regulated learning processes
2010
The paper focuses on the main advantages of defining and using an open and modular e-learning software platform to support highly cognitive tasks performed by the main actors of the learning process. We present in detail the integration inside the platform of two intelligent agents devoted to talking with the student and to retrieving new information sources on the Web. The process is triggered as a reply to the system’s perception that the student feels discontented with the presented contents. The architecture is detailed, and some conclusions about the growth of the platform’s overall performance are expressed.
Semantics driven interaction using natural language in students tutoring
2007
The aim of this work is to introduce a semantic integration between an ontology and a chatbot in an Intelligent Tutoring System (ITS) to interact with students using natural language. The interaction process is driven by the use of a purposely defined ontology. In the ontology, two types of conceptual relations are defined. Besides the usual relations, which are used to define the domain's structure, another type of relation is used to define the navigation schema inside the ontology according to the need to manage uncertainty. The uncertainty level is related to the student's knowledge level about the involved concepts. In this work we propose an ITS for the Java programming language called TutorJ…
On parsing optimality for dictionary-based text compression—the Zip case
2013
Dictionary-based compression schemes have been the most commonly used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, and are generally referred to as LZ77. Their work is the base of Zip, gZip, 7-Zip and many other compression software utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have left the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented, filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results …
The language of emotion in short blog texts
2008
Emotion is central to human interactions, and automatic detection could enhance our experience with technologies. We investigate the linguistic expression of fine-grained emotion in 50- and 200-word samples of real blog texts previously coded by expert and naive raters. Content analysis (LIWC) reveals that angry authors use more affective language and negative affect words, and that joyful authors use more positive affect words. Additionally, a co-occurrence semantic space approach (LSA) was able to identify fear (which naive human emotion raters could not do). We relate our findings to human emotion perception and note potential computational applications.